DETAILED ACTION
Response to Amendment 

1. Applicant's arguments have been considered but are moot in view of the new
ground(s) of rejection necessitated by claim amendments.

2. Applicant's arguments have been fully considered but they are not persuasive. 
User-specific models are originally constructed for the speech recognizer (the operation
of figure 4; particularly steps 122-124). At runtime, new speech data is received (step
140 in figure 5) and is used to train speaker-dependent data (referring to the operation
of figure 5; for a detailed discussion of speaker adaptation, see col. 9, line 29 to
col. 10, line 67).

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1-5, 7-12, 15, 18-24, 26-28, 30-39, 42-50, 52, and 53-66 are rejected
under 35 U.S.C. 103(a) as being unpatentable over Saylor et al. (US 6792086) in view
of Kuhn et al. (US 6341264).

5. Regarding claims 1, 30, and 53-65, Saylor et al. disclose a voice portal hosting
system, intended to be connected to a first voice telecommunication network in order for 
a plurality of users in said network to establish a connection with the system using voice 
equipment in support of the ordering of products and/or services from any of a plurality 
of independent value-added service providers, said system comprising: 

a memory in which a plurality of interactive voice response applications providing 
interactive response functionality is stored, each of said applications including an 
executable component for execution by said hosting system (VPAGE Database 50 in 
figure 3, voice response application includes TML, XML, VoiceXML, WML, and others in 
col. 21, lines 10-45); 

a common speech recognition module (voice to text system 62 in figure 3); 

a user identification module for identifying a user (col. 7, line 58 to col. 8, line 15); 

uploading means for independently uploading said plurality of interactive voice
response applications, to said system in advance of any ordering of said products
and/or services, by said plurality of independent value-added service providers (col. 20,
line 64 to col. 21, line 45 and/or referring to figure 3, content provider 70 provides
information to VPAGE Server 22; uploading information to the central server before it
could be retrieved by the user), and wherein the identified user interacts with one or
more of said interactive voice response applications (col. 8, lines 1-38, the identified user is
allowed to access voice services), and wherein each of said interactive voice response
applications includes an executable component for execution by said hosting system
(VPAGE Database 50 in figure 3, voice response application includes TML, XML,
VoiceXML, WML, and others in col. 21, lines 10-45; these are executable components).

Saylor et al. fail to specifically disclose means for storing a plurality of user- 
specific speech models adapted to specific users for use by the common speech 
recognition module; means for retrieving the user-specific speech model of the identified 
user from said plurality of models; and wherein said one or more interactive voice 
response applications utilize said retrieved user-specific speech models via said 
common speech recognition module for recognizing speech of the identified user; said 
user-specific speech model is further adapted to the specific user during said ordering 
of said product and/or services from any one of said service providers such that said 
further adapted model is then utilized for future ordering of products and/or services 
from any other of said service providers. 

However, Kuhn et al. teach means for storing a plurality of user-specific speech 
models adapted to specific users for use by the common speech recognition module 
(figure 7, speaker adaptation); means for retrieving the user-specific speech model of 
the identified user from said plurality of models (the operation of figure 7 and elements 
32; 34; 26 in figure 2); and wherein said one or more interactive voice response 
applications utilize said retrieved user-specific speech models via said common speech 
recognition module for recognizing speech of the identified user (the operation of figure 
7 and elements 32; 34; 26 in figure 2); and said user-specific speech model is further 
adapted to the specific user during said ordering of said product and/or services from 
any one of said service providers such that said further adapted model is then utilized 
for future ordering of products and/or services from any other of said service providers
(referring to the operation of figure 5 and/or col. 9, lines 29-67; using input speech data to
adapt speaker-dependent models for the user).

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in
order to improve speech recognition accuracy by using user-specific speech models.

6. Regarding claim 50, Saylor et al. disclose a method for allowing each of a
plurality of independent value-added service providers to set up interactive voice
response applications each including an executable component for execution by a voice
portal hosting system commonly used by said plurality of value-added service
providers and which can be used by a plurality of users (the operation of figure 1,
multiple users access voice services at the server having a common speech recognizer, 
and independent service providers connected to the server providing voice response 
applications), said method comprising the steps of: 

independently uploading, through a second telecommunication network, said 
interactive voice response applications to said system for providing interactive voice 
response functionality (col. 20, line 64 to col. 21, line 45 and/or referring to figure 3,
content provider 70 provides information to VPAGE Server 22); 

identifying a user calling said system (col. 7, line 58 to col. 8, line 15);

retrieving speech models for the speech recognizer (voice to text system 62 in
figure 3, uses system speech recognition models to recognize speech);

executing one or more of said voice response applications in response to the 
user calling said system (VPAGE Database 50 in figure 3, voice response application 
includes TML, XML, VoiceXML, WML, and others in col. 21, lines 10-45), said executing 
including interacting with said user via said common speech module using said 
retrieved speech model for recognizing the speech of the user (voice to text system 62 
in figure 3, uses system speech recognition models to recognize speech), wherein each 
of said interactive voice response applications includes an executable component for 
execution by said hosting system (VPAGE Database 50 in figure 3, voice response 
application includes TML, XML, VoiceXML, WML, and others in col. 21, lines 10-45; 
these are executable components). 

Saylor et al. fail to specifically disclose means for storing a plurality of user- 
specific speech models adapted to specific users for use by the common speech 
recognition module; means for retrieving the user-specific speech model of the identified 
user from said plurality of models; and wherein said one or more interactive voice 
response applications utilize said retrieved user-specific speech models via said 
common speech recognition module for recognizing speech of the identified user; said 
user-specific speech model is further adapted to the specific user during said ordering 
of said product and/or services from any one of said service providers such that said 
further adapted model is then utilized for future ordering of products and/or services 
from any other of said service providers.

However, Kuhn et al. teach means for storing a plurality of user-specific speech
models adapted to specific users for use by the common speech recognition module
(figure 7, speaker adaptation); means for retrieving the user-specific speech model of
the identified user from said plurality of models (the operation of figure 7 and elements
32; 34; 26 in figure 2); and wherein said one or more interactive voice response 
applications utilize said retrieved user-specific speech models via said common speech 
recognition module for recognizing speech of the identified user (the operation of figure 
7 and elements 32; 34; 26 in figure 2); and said user-specific speech model is further 
adapted to the specific user during said ordering of said product and/or services from 
any one of said service providers such that said further adapted model is then utilized 
for future ordering of products and/or services from any other of said service providers 
(referring to the operation of figure 5 and/or col. 9, lines 29-67; using input speech data to
adapt speaker-dependent models for the user). 

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in
order to improve speech recognition accuracy by using user-specific speech models.

7. Regarding claim 52, Saylor et al. disclose a voice portal hosting system allowing 
a plurality of users to establish a connection with said system using voice equipment for 
interacting with one or more of a plurality of service providers, said system comprising:

means for independently uploading a plurality of interactive voice response
applications from said service providers, to said system, via a communication channel
(col. 20, line 64 to col. 21, line 45 and/or referring to figure 3, content provider 70
provides information to VPAGE Server 22), each of said voice response applications for 
providing interactive voice response functionality for a corresponding one of said service 
providers when executed by said hosting system (VPAGE Database 50 in figure 3, 
voice response application includes TML, XML, VoiceXML, WML, and others in col. 21, 
lines 10-45); 

means for storing said plurality of interactive voice response applications 
(VPAGE Database 50 in figure 3, voice response application includes TML, XML, 
VoiceXML, WML, and others in col. 21, lines 10-45); 

a common speech recognition module (voice to text system 62 in figure 3); 

means for storing a plurality of speech models adapted to specific users for use 
by the common speech recognition module (voice to text system 62 in figure 3, uses 
system speech recognition models to recognize speech); 

a user identification module for identifying a user calling said system via another 
communication channel (col. 7, line 58 to col. 8, line 15); 

means for retrieving the speech model of the identified user from said plurality of 
models (voice to text system 62 in figure 3, uses system speech recognition models to 
recognize speech), wherein 

the identified user interacts with one or more of said interactive voice response
applications (col. 8, lines 1-38, the identified user is allowed to access voice services); and



Application/Control Number: 09/835,237 Page 9 

Art Unit: 2626 

wherein each of said interactive voice response applications includes an 
executable component for execution by said hosting system (VPAGE Database 50 in 
figure 3, voice response application includes TML, XML, VoiceXML, WML, and others in 
col. 21, lines 10-45; these are executable components). 

Saylor et al. fail to specifically disclose means for storing a plurality of user- 
specific speech models adapted to specific users for use by the common speech 
recognition module; means for retrieving the user-specific speech model of the identified 
user from said plurality of models; and wherein said one or more interactive voice 
response applications utilize said retrieved user-specific speech models via said 
common speech recognition module for recognizing speech of the identified user; said 
user-specific speech model is further adapted to the specific user during said ordering 
of said product and/or services from any one of said service providers such that said 
further adapted model is then utilized for future ordering of products and/or services 
from any other of said service providers. 

However, Kuhn et al. teach means for storing a plurality of user-specific speech 
models adapted to specific users for use by the common speech recognition module 
(figure 7, speaker adaptation); means for retrieving the user-specific speech model of 
the identified user from said plurality of models (the operation of figure 7 and elements
32; 34; 26 in figure 2); and wherein said one or more interactive voice response 
applications utilize said retrieved user-specific speech models via said common speech 
recognition module for recognizing speech of the identified user (the operation of figure 
7 and elements 32; 34; 26 in figure 2); and said user-specific speech model is further 
adapted to the specific user during said ordering of said product and/or services from
any one of said service providers such that said further adapted model is then utilized 
for future ordering of products and/or services from any other of said service providers 
(referring to the operation of figure 5 and/or col. 9, lines 29-67; using input speech data to
adapt speaker-dependent models for the user). 

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in
order to improve speech recognition accuracy by using user-specific speech models.

8. Regarding claims 2-5 and 31-35, Saylor et al. further disclose the voice portal
hosting system, wherein said common speech recognition module comprises a common 
user profile database (col. 7, line 58 to col. 8, line 15), and wherein said common user 
profile database includes user preferences (col. 7, line 58 to col. 8, line 15), and wherein 
said user preferences include a delivery address for goods and/or services ordered with 
said value-added service providers (col. 7, line 58 to col. 8, line 15), wherein said user 
preferences include a billing address and/or preferences for goods and services ordered 
with said value-added service providers (col. 7, line 58 to col. 8, line 15), wherein said 
common speech recognition module uses user-specific speech models (col. 7, line 58 to 
col. 8, line 15, voice print authentication).

9. Regarding claims 20-24, 26-28, and 44-49, Saylor et al. further disclose the voice
portal hosting system, wherein at least a plurality of said interactive voice response 
applications use a common billing module and a common clearing center for 
dispatching the collected amounts to said value-added service providers (Billing Module 
46 in figure 2), wherein said common billing module allows for the billing of transactions 
between said users and said value-added service providers on a common bill prepared 
by the operator of said voice portal hosting system (Billing Module 46 in figure 2), and 
wherein at least a plurality of said users have a deposit account on said voice portal 
hosting system which can be used for transactions with a plurality of said value-added 
service providers (Billing Module 46 in figure 2), wherein at least a plurality of said 
interactive voice response applications use a user authentication module based on an 
electronic signature and/or on biometric parameters of said users (col. 7, line 58 to col. 
8, line 15, voice print authentication), wherein said second telecommunication network 
is a TCP/IP network (col. 14, lines 5-25 and/or referring to network 20 in figures 1-3), 
wherein at least some of said interactive voice response applications are described with 
VoiceXML documents (col. 21, lines 10-45), wherein at least one free interactive voice 
response application is made available by the operator of the system (col. 21, lines 10- 
45), and wherein said free interactive voice response application includes a free 
directory assistance service (col. 36, line 53 to col. 37, line 8). 

10. Regarding claims 7-8 and 36, Saylor et al. fail to specifically disclose the voice
portal hosting system, wherein said common speech recognition module uses user-
specific speech models, means for adapting said common speech models associated to
a user during each dialogue between said user and each of said interactive voice 
response applications, and wherein said means for adapting said common speech 
models uses recorded users' speech samples for adapting said common speech 
models off-line. 

However, Kuhn et al. teach a speech recognition module using user-specific
speech models (figure 2, speech recognizer uses adapted speech models), means for 
adapting said common speech models associated to a user during each dialogue 
between said user and each of said interactive voice response applications (figure 7 or 
34 in figure 2), and wherein said means for adapting said common speech models uses 
recorded users' speech samples for adapting said common speech models off-line (the 
operation of figure 7). 

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in
order to improve speech recognition accuracy.

11. Regarding claims 9-10, Saylor et al. fail to specifically disclose the voice portal
hosting system of claim 1, wherein said common speech recognition module uses 
Hidden Markov Models, and further comprising a Hidden Markov Models adaptation 
module for adapting said models to said user, and wherein said Hidden Markov 
Models adaptation module allows for an incremental adaptation of said models.
However, Kuhn et al. teach that a common speech recognition module uses Hidden Markov
Models, and further comprising a Hidden Markov Models adaptation module for 
adapting said models to said user (figure 7), and wherein said Hidden Markov Models 
adaptation module allows for an incremental adaptation of said models (figure 7). 

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in
order to improve speech recognition accuracy.

12. Regarding claims 11-12 and 37-38, Saylor et al. fail to specifically disclose the 
voice portal hosting system, wherein said common speech recognition module uses 
user-specific language models, and means for adapting said common language models 
associated to a user during each dialogue between said user and each of said 
interactive voice response applications. However, Kuhn et al. teach that a common speech
recognition module uses user-specific language models (the operation of figure 7 and
elements 32; 34; 26 in figure 2), and means for adapting said common language models 
associated to a user during each dialogue between said user and each of said 
interactive voice response applications (figure 7, speaker adaptation is done during 
speech dialog with the system). 

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in
order to improve speech recognition accuracy.

13. Regarding claims 15, 18-19, 39, and 42-43, Saylor et al. fail to specifically 
disclose the voice portal hosting system, wherein at least a plurality of said interactive 
voice response applications use a common user identification module run on said 
system, wherein said user identification module uses a voice-based user identification 
module, wherein said common speech recognition module uses a speaker-dependent
speech recognition algorithm, and wherein said speaker is identified by said common 
user identification module. 

However, Kuhn et al. further teach that at least a plurality of said interactive voice 
response applications use a common user identification module run on said system, 
wherein said user identification module uses a voice-based user identification module, 
wherein said common speech recognition module uses a speaker-dependent speech
recognition algorithm, and wherein said speaker is identified by said common user
identification module (the operation of figure 2).

Because Saylor et al. and Kuhn et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the time of
invention to modify Saylor et al. by incorporating the teaching of Kuhn et al. in order to
identify the user and the user's profile for use by the speech recognizer to improve
speaker recognition accuracy by using a speaker-dependent codebook trained by
users in advance.

14. Regarding claim 66, Saylor et al. further disclose the method of claim 65, wherein
said adapted retrieved user-specific speech and language model is made available for 
use by all others of said interactive voice response applications of the other providers 
(speech recognizer of Saylor inherently includes speech models and language models 
in order for the recognizer to function). 

15. Claims 13-14 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Saylor et al. (US 6792086) in view of Kuhn et al. (US 6341264), as applied to claim 1,
and further in view of Beyda et al. (US 6487277).

16. Regarding claims 13-14, Saylor et al. fail to specifically disclose a voice portal 
hosting system of claim 1, wherein said common speech recognition module uses 
selections previously made by said users, and wherein said selections previously made 
by said users are stored in said voice portal hosting system for improving the 
arborescence of the menus. However, Beyda et al. teach that the common speech recognition
module uses selections previously made by said users, and wherein said selections 
previously made by said users are stored in said voice portal hosting system for 
improving the arborescence of the menus (see abstract). 

Because Saylor et al. and Beyda et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Beyda et al. in
order to tailor the presentation order to the needs of each individual user to improve
the system's efficiency.

17. Claims 16-17 and 40-41 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Saylor et al. (US 6792086) in view of Kuhn et al. (US 6341264), as 
applied to claims 15 and 39, respectively, and further in view of Woods et al. (US 
6510417). 

18. Regarding claims 16-17 and 40-41, Saylor et al. fail to specifically disclose that
the user identification module uses an identification of the equipment used by said user 
in said first telecommunication network, and being operated by a telecom operator of 
said first telecommunication network, wherein said user identification module uses an 
identification of the equipment used by said user in said first telecommunication network 
even when said identification is not available for the other B-subscribers of said first 
telecommunication network. However, Woods et al. teach that the user identification 
module uses an identification of the equipment used by said user in said first 
telecommunication network, and being operated by a telecom operator of said first 
telecommunication network, wherein said user identification module uses an 
identification of the equipment used by said user in said first telecommunication network 
even when said identification is not available for the other B-subscribers of said first 
telecommunication network (col. 24, lines 39-41).

Because Saylor et al. and Woods et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the art at the
time of invention to modify Saylor et al. by incorporating the teaching of Woods et al. in
order to allow the system to automatically authenticate users based on their phone
numbers by using a caller-ID procedure.

Conclusion 

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to HUYEN X. VO whose telephone number is (571) 272-7631.
The examiner can normally be reached on M-F, 9-5:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Patrick Edouard, can be reached on 571-272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Huyen X Vo/ 2/25/2008 
Primary Examiner, Art Unit 2626 



